Confluence as source plugin #5404

Open
wants to merge 20 commits into
base: main

san81
Collaborator

@san81 san81 commented Feb 1, 2025

Description

Adding Confluence as a source, based on the common SaaS source-crawler framework. This PR turned out a little large because I introduced an Atlassian common module to keep the code shared by Jira and Confluence in one place. Some classes that are reusable across other SaaS crawlers were moved to source-crawler. The code common to Jira and Confluence is the way we authenticate to Atlassian Cloud. In this PR I didn't touch the Jira source code yet, to avoid increasing the PR size further. I will follow up with a PR that makes use of atlassian-commons in the Jira source code.

Issues Resolved

Introducing Confluence as a source

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

san81 added 14 commits January 29, 2025 16:38
Signed-off-by: Santhosh Gandhe <[email protected]>

@Override
public void initCredentials() {
//do nothing for basic authentication
Collaborator

Should there be a default implementation with an empty body? That way you do not need it here.

Collaborator Author

Agree. Implemented the change

/**
* Initializes the credentials for the Jira instance.
*/
void initCredentials();
Collaborator

Adding an empty default implementation may be helpful here. Something to consider.

Collaborator Author

Agree. Implemented the change
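
A minimal sketch of the suggestion above, assuming a Java 8+ interface. Only `initCredentials()` comes from the PR; the names `AuthConfig`, `BasicAuthConfig`, `OAuthConfig`, `getUrl()` and `isInitialized()` are hypothetical and exist only for illustration. With an empty `default` method, an implementation that has nothing to initialize (such as basic authentication) no longer needs its own empty override.

```java
// Hypothetical interface; only the method name initCredentials() is taken from the PR.
interface AuthConfig {

    /** Initializes the credentials. A no-op unless an implementation overrides it. */
    default void initCredentials() {
        // nothing to do by default
    }

    String getUrl();
}

// Basic authentication has nothing to initialize, so it inherits the empty default.
final class BasicAuthConfig implements AuthConfig {
    private final String url;

    BasicAuthConfig(String url) {
        this.url = url;
    }

    @Override
    public String getUrl() {
        return url;
    }
}

// An implementation that does need setup still overrides the method.
final class OAuthConfig implements AuthConfig {
    private final String url;
    private boolean initialized = false;

    OAuthConfig(String url) {
        this.url = url;
    }

    @Override
    public void initCredentials() {
        // placeholder for a real token fetch or refresh
        initialized = true;
    }

    @Override
    public String getUrl() {
        return url;
    }

    boolean isInitialized() {
        return initialized;
    }
}
```

The trade-off is that a new implementation can forget to override the method without any compiler warning, so the default suits cases where doing nothing is a valid behavior.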

*/
public class Constants {

public static final int RETRY_ATTEMPT = 6;
Collaborator

nit: should this be "MAX_RETRIES"?

Collaborator Author

Renamed


@Override
public String getPartitionKey() {
return space + "|" + contentType + "|" + UUID.randomUUID();
Collaborator

Is this a well-known convention? If not, why "space"?

Collaborator Author

In Confluence terms, a "space" is a named collection of pages or blog posts. For example, all documentation related to product management goes into a "Product Management" space.

san81 added 2 commits February 5, 2025 15:02
Signed-off-by: Santhosh Gandhe <[email protected]>
* Exception to indicate unauthorized access.
* It could be caused either by invalid credentials supplied by the user or by a failure to renew the credentials.
*/
public final class UnAuthorizedException extends RuntimeException {
Member

UnauthorizedException

Collaborator Author

renamed

implementation 'com.fasterxml.jackson.core:jackson-databind'
implementation 'javax.inject:javax.inject:1'
implementation 'org.jsoup:jsoup:1.18.3'
implementation("org.springframework:spring-web:${libs.versions.spring.get()}")
Member

We should just move spring-web into the libs file in the settings. Then we can clean these up.

Collaborator Author

moved to libs

}
}

test {
Member

You don't need this. It is inherited.

Collaborator Author

removed

testImplementation project(path: ':data-prepper-test-common')
}

test {
Member

You don't need this. It is inherited.

Collaborator Author

removed

* Jira account url
*/
@JsonProperty("hosts")
protected List<String> hosts;
Member

Why is this a list of hosts? It seems the code only supports one host and that would be simpler for users:

host: https://myaccount.atlassian.com

Collaborator Author

It is a recommendation from Raj, I guess to support future expansion.

}


public static boolean validateConfig(ConfluenceSourceConfig config) {
Member

This should be done in the configuration classes themselves.

* The type Confluence configuration.
*/
@Slf4j
public class ConfluenceConfigHelper {
Member

I think all of these things can be moved into the configuration classes themselves to have consistent access into them.
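
One way to read the two comments above is that the checks and derived accessors currently held in a static helper would live on the configuration object itself. The following is only a sketch in plain Java under that reading: `SpaceFilterConfig`, its fields, and `isIncludeExcludeDisjoint()` are hypothetical names, not classes from this PR, and a real plugin would more likely express the check through its validation framework's annotations.

```java
import java.util.List;

// Hypothetical configuration class; none of these names come from the PR.
final class SpaceFilterConfig {
    private final List<String> include;
    private final List<String> exclude;

    SpaceFilterConfig(List<String> include, List<String> exclude) {
        // Defensive copies keep the configuration immutable after construction.
        this.include = List.copyOf(include);
        this.exclude = List.copyOf(exclude);
    }

    List<String> getInclude() {
        return include;
    }

    List<String> getExclude() {
        return exclude;
    }

    /**
     * The validation rule sits next to the data it checks, so callers do not
     * need a separate static helper to find out whether the config is usable.
     */
    boolean isIncludeExcludeDisjoint() {
        return include.stream().noneMatch(exclude::contains);
    }
}
```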

int total;
int startAt = 0;
do {
ConfluenceSearchResults searchContentItems = confluenceRestClient.getAllContent(cql, startAt);
Member

There may be some CQL injection issues. You will need either to do proper CQL escaping or to add validation that ensures the values contain only alphanumeric characters. I think you could write some odd queries by including ) in the values.

Collaborator Author

Individual filter items are validated, either with a regex or by checking against the expected enum values, before they are placed into the CQL construction.
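
The validate-before-building approach discussed in this thread can be sketched roughly as follows. This is not the PR's code: `CqlFilterBuilder`, `validated`, and `spaceInClause` are hypothetical names, and the assumption that a filter value may contain only ASCII letters and digits is stricter than Confluence itself may require. The point is that a value containing `)` or `"` is rejected before it can change the shape of the query.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper; names and the allowed character set are assumptions.
final class CqlFilterBuilder {

    // Allow-list: one or more ASCII letters or digits, nothing else.
    private static final Pattern SAFE_VALUE = Pattern.compile("[A-Za-z0-9]+");

    private CqlFilterBuilder() {
    }

    /** Returns the value unchanged if it is safe, otherwise throws. */
    static String validated(String value) {
        // matches() requires the whole string to fit the pattern.
        if (value == null || !SAFE_VALUE.matcher(value).matches()) {
            throw new IllegalArgumentException("Unsafe CQL filter value: " + value);
        }
        return value;
    }

    /** Builds a clause such as: space in ("DEV", "DOCS") */
    static String spaceInClause(List<String> spaceKeys) {
        if (spaceKeys.isEmpty()) {
            throw new IllegalArgumentException("At least one space key is required");
        }
        return spaceKeys.stream()
                .map(CqlFilterBuilder::validated)   // reject before concatenating
                .map(key -> "\"" + key + "\"")
                .collect(Collectors.joining(", ", "space in (", ")"));
    }
}
```

Called with `List.of("DEV", "DOCS")`, `spaceInClause` returns `space in ("DEV", "DOCS")`, while a value such as `DEV") OR (type=page` throws `IllegalArgumentException` instead of reaching the query string.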

san81 added 3 commits February 8, 2025 23:14
Signed-off-by: Santhosh Gandhe <[email protected]>
3 participants